Speech Recognition

Sure, we can talk to computers.  But can we communicate with computers?

It is a good bet that anyone reading this website has been exposed to the "Star Trek" vision of the future of computing.  You know: not only is the computer of the future a vast bank of information, but it also handles that information in much the same way that a very, very intelligent human being would.  Recent incarnations of the series have included an almost-human android and, more recently, an indistinguishable-from-human holographic projection.
 
Apart from the fact that human beings are rendered dangerously superfluous in the company of such uber-beings, the advent of thinking machines represents a kind of Technotopia that people like Ray Kurzweil find compelling.  Fun, sure.  But inevitable?  Let's consider.
 
The vast majority of educated people now take as fact the proposition that human beings are the product of undirected natural forces.  Given that people (with all their complexity and potential for intelligence) came about with no intelligent input, is it not inevitable that machines -- with considerable intelligent input from humans -- will eventually "evolve" into something vastly superior?  As Kurzweil states in The Age of Spiritual Machines, "Evolution determined an answer to this problem in a few billion years.  We've made a good start in a few thousand years.  We are likely to finish the job in a few more decades."
 
Without question, it is a tricky business to extrapolate back from observable evolution (a largely insignificant business) to the critical elements of the biological evolutionary process.  (Don't take my word for it; read, for example, the slightly biased Darwin Wars.)  With this in mind, it is not clear that it will be any easier to extrapolate forward to the critical elements of the technological evolutionary process.  But let's look at two examples of technological evolution and consider the trajectory.
 
The first example is dear to the hearts of most Artificial Intelligence proponents: chess.  After Deep Blue beat Garry Kasparov in May 1997, predictions concerning the "singularity" (the point at which machines overtake humans in intelligence) began to skyrocket.  But the domain of chess is simple, and its rules are few indeed.  Moreover, considering the very few who have mastered the game, perhaps chess is not an inspired choice as a test of intelligence.  This brings us to our second example: language.
 
Language is a task that nearly every small child masters with sufficient exposure.  Yet decades of research have produced very few real breakthroughs in Automatic Speech Recognition (ASR) or Natural Language Understanding (NLU), both integral to the machine capture of language.  Yes, it is true that computational advances have enabled decades-old ASR algorithms to achieve real-time performance, but that performance still lags far behind human recognition (see, for example, the technical paper on Human Recognition [sorry, I do this stuff for a living]).
 
The situation in NLU research is even bleaker.  After a decade of competitive research, the contestants for the Loebner Prize are no closer to the dangling carrot (follow the link to see for yourself how pathetic the best-of-the-best computer conversations sound, or visit Simon Laven's Chatterbots and take in the state-of-the-art demos).  And while "ontological" research such as the Cyc Project promises a step up for NLU, one might reasonably expect at least a modest demonstration of that promise after eighteen years of work.

Essentially, there is a "wall" that Artificial Intelligence can only dream about breaking through.  On one side of the wall is phonetics; on the other is semantics.  Whatever the task and whatever the medium, one can consider the "phonetics" of the matter to consist of the building blocks of information transfer.  Examples of this generalized "phonetics" include phonemes (for language) and binary digits (i.e., bits) for electronic computation.  The generalized "semantics" of the problem, on the other hand, represents the "meaning."  In toy problems like chess, the semantics are trivial by comparison to the phonetics.  For language, however, meaning is non-trivially grounded in experience.  Attempts at Knowledge Representation (KR) may be sadly misguided: by "translating" semantics into phonetics, KR may simply be demonstrating what we do not know rather than what we do know.  Let me suggest that it is quite possible that semantics do not exist in a computational mode!

But you don't have to take it from me; listen to Roger Penrose.  Read The Emperor's New Mind, and then come back and argue with me ;-)  Incidentally, science's inability to understand the mind (the laughably titled How the Mind Works notwithstanding) represents the true "Final Frontier."  Perhaps there are some philosophical limitations to "understanding understanding."  But then I've always enjoyed recursive logic, precisely because it is beyond our capacity to manage.  I guess that is why I like "magic," too: it clearly demonstrates the fragility of our observations, our thoughts, and our analysis.  Yes, it is true that our difficulties with recursive logic might be "illusions" (as are our "difficulties" with magic).  But they might not be...

(C) copyright 2002, Doug Peters